IBM Research
Toward Continuous Neurocognitive Monitoring: Integrating Speech AI with Relational Graph Transformers for Rare Neurological Diseases
Norel, Raquel, Merler, Michele, Modi, Pavitra
Patients with rare neurological diseases report cognitive symptoms--"brain fog"--that are invisible to traditional tests. A proof-of-concept in phenylketonuria (PKU) shows that a speech-derived "Proficiency in Verbal Discourse" measure correlates [...]. Success would transform episodic neurology into continuous, personalized monitoring for millions globally. In PKU, adults describe "brain fog" and working memory deficits. We envision smartphone-based speech analysis integrated with medical databases via Relational Graph Transformers (RELGT), enabling continuous neurocognitive monitoring--transforming reactive, episodic care into proactive precision neurology. Parkinson's disease involves hypophonia and speech fluctuations tied to medication; Huntington's disease reflects CAG-repeat-driven degeneration and progressive motor-cognitive decline; Wilson's disease presents with dysarthria linked to copper accumulation.
Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models
Candello, Heloisa, Azmat, Muneeza, Gunturi, Uma Sushmitha, Horesh, Raya, de Paula, Rogerio Abreu, Pimentel, Heloisa, Grave, Marcelo Carpinette, Adebiyi, Aminat, Machado, Tiago, de Macedo, Maysa Malfiza Garcia
With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is these models' `aptitude' for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed responses under two conditions: a harmful response plus its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic context. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduce new metrics for training and evaluating mitigation strategies, along with insights for human-AI evaluation studies.
Solving Context Window Overflow in AI Agents
Labate, Anton Bulle, de Sousa, Valesca Moura, Fiorini, Sandro Rama, Azevedo, Leonardo Guerreiro, Thiago, Raphael Melo, da Silva, Viviane Torres
Large Language Models (LLMs) have become increasingly capable of interacting with external tools, granting access to specialized knowledge beyond their training data - critical in dynamic, knowledge-intensive domains such as Chemistry and Materials Science. However, large tool outputs can overflow the LLMs' context window, preventing task completion. Existing solutions such as truncation or summarization fail to preserve complete outputs, making them unsuitable for workflows requiring the full data. This work introduces a method that enables LLMs to process and utilize tool responses of arbitrary length without loss of information. By shifting the model's interaction from raw data to memory pointers, the method preserves tool functionality, allows seamless integration into agentic workflows, and reduces token usage and execution time. The proposed method is validated on a real-world Materials Science application that cannot be executed with conventional workflows, and its effectiveness is demonstrated via a comparative analysis where both methods succeed. In this experiment, the proposed approach consumed approximately seven times fewer tokens than the traditional workflow.
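The pointer mechanism described above can be sketched in a few lines. The store, threshold, and function names below are illustrative assumptions, not the paper's actual implementation:

```python
import uuid

# Hypothetical in-memory store; in a real agent this could be a database or file store.
_MEMORY: dict[str, str] = {}

MAX_INLINE_CHARS = 2000  # assumed threshold beyond which output is stored, not inlined

def store_tool_output(output: str) -> str:
    """Return the raw output if small, else a short memory pointer the LLM can pass around."""
    if len(output) <= MAX_INLINE_CHARS:
        return output
    key = f"mem://{uuid.uuid4().hex}"
    _MEMORY[key] = output
    # The LLM only ever sees this short pointer, keeping the context window small.
    return key

def resolve(ref: str) -> str:
    """Dereference a memory pointer when a downstream tool needs the full data."""
    return _MEMORY.get(ref, ref)  # plain strings pass through unchanged

# Usage: a tool returning a result far larger than the inline threshold.
big_result = "x" * 100_000
ptr = store_tool_output(big_result)
```

Here the model's context holds only `ptr` (a few dozen characters) instead of the 100,000-character payload, while downstream tools recover the complete output via `resolve`, so no information is lost.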
Episodic Memory in Agentic Frameworks: Suggesting Next Tasks
Fiorini, Sandro Rama, Azevedo, Leonardo G., Thiago, Raphael M., de Sousa, Valesca M., Labate, Anton B., da Silva, Viviane Torres
Agentic frameworks powered by Large Language Models (LLMs) can be useful tools in scientific workflows by enabling human-AI co-creation. A key challenge is recommending the next steps during workflow creation without relying solely on LLMs, which risk hallucination and require fine-tuning with scarce proprietary data. We propose an episodic memory architecture that stores and retrieves past workflows to guide agents in suggesting plausible next tasks. By matching current workflows with historical sequences, agents can recommend steps based on prior patterns.
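A minimal sketch of the matching idea, with an invented episode store and a simple last-task vote standing in for the authors' retrieval scheme:

```python
from collections import Counter

# Hypothetical episodic memory: past workflows stored as task sequences.
# Task names and the matching rule are illustrative, not the paper's design.
EPISODES = [
    ["load_data", "clean", "featurize", "train", "evaluate"],
    ["load_data", "clean", "train", "evaluate"],
    ["load_data", "featurize", "train", "deploy"],
]

def suggest_next(current, episodes=EPISODES, k=2):
    """Rank next tasks by how often they followed the current workflow's last task."""
    if not current:
        return []
    last = current[-1]
    votes = Counter()
    for ep in episodes:
        for i, task in enumerate(ep[:-1]):
            if task == last:
                votes[ep[i + 1]] += 1
    return [t for t, _ in votes.most_common(k)]

# After loading and cleaning, memory suggests what usually came next in past workflows.
suggestions = suggest_next(["load_data", "clean"])
```

Because suggestions come from stored episodes rather than free generation, every recommended task is one that actually occurred in a prior workflow, which is what curbs hallucination.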
Cardinality-Regularized Hawkes-Granger Model
This section provides the parameter estimation equations in the MM procedure Eq. (13) for the baseline. Below, we provide results for the exponential and power distributions. We also describe the details of the experiments. The Dense10 data sets and the Python code to generate them are included as part of the final submission. Due to the stochastic nature of the generator, the total number of event instances cannot be controlled; see the attached code for details.
Toward Cybersecurity-Expert Small Language Models
Levi, Matan, Ohayon, Daniel, Blobstein, Ariel, Sagi, Ravid, Molloy, Ian, Allouche, Yair
Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) ranging from 4B to 20B parameters. To train CyberPal 2.0, we generate an enriched chain-of-thought cybersecurity instruction dataset built with our data enrichment and formatting pipeline, SecKnowledge 2.0, which integrates expert-in-the-loop steering of reasoning formats alongside LLM-driven multi-step grounding, yielding higher-fidelity, task-grounded reasoning traces for security tasks. Across diverse cybersecurity benchmarks, CyberPal 2.0 consistently outperforms its baselines and matches or surpasses various open and closed-source frontier models, while remaining a fraction of their size. On core cyber threat intelligence knowledge tasks, our models outperform almost all tested frontier models, ranking second only to Sec-Gemini v1. On core threat-investigation tasks, such as correlating vulnerabilities and bug tickets with weaknesses, our best 20B-parameter model outperforms GPT-4o, o1, o3-mini, and Sec-Gemini v1, ranking first, while our smallest 4B-parameter model ranks second.
Statistical multi-metric evaluation and visualization of LLM system predictive performance
Ackerman, Samuel, Farchi, Eitan, Raz, Orna, Toledo, Assaf
The evaluation of generative or discriminative large language model (LLM)-based systems is often a complex multi-dimensional problem. Typically, a set of system configuration alternatives are evaluated on one or more benchmark datasets, each with one or more evaluation metrics, which may differ between datasets. We often want to evaluate -- with a statistical measure of significance -- whether systems perform differently either on a given dataset according to a single metric, on aggregate across metrics on a dataset, or across datasets. Such evaluations can be done to support decision-making, such as deciding whether a particular system component change (e.g., choice of LLM or hyperparameter values) significantly improves performance over the current system configuration, or, more generally, whether a fixed set of system configurations (e.g., a leaderboard list) have significantly different performances according to metrics of interest. We present a framework implementation that automatically performs the correct statistical tests, properly aggregates the statistical results across metrics and datasets (a nontrivial task), and can visualize the results. The framework is demonstrated on the multi-lingual code generation benchmark CrossCodeEval, for several state-of-the-art LLMs.
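One simple instance of such a significance test is a paired permutation (sign-flip) test on per-dataset scores of two configurations. The scores below are invented for illustration; the framework itself selects among tests and handles the cross-metric aggregation:

```python
from itertools import product

# Invented per-dataset scores for two system configurations on a single metric.
scores_a = [0.71, 0.64, 0.80, 0.58, 0.69]
scores_b = [0.68, 0.60, 0.77, 0.59, 0.65]

def paired_permutation_pvalue(a, b):
    """Exact two-sided paired permutation test on the summed score difference.

    Enumerates all sign flips of the paired differences and counts how often
    the flipped statistic is at least as extreme as the observed one.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product([1, -1], repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            hits += 1
    return hits / total

p = paired_permutation_pvalue(scores_a, scores_b)
```

With only five datasets the smallest attainable p-value is 1/2^5, which illustrates why aggregating evidence across metrics and datasets, as the framework does, matters for reaching significance.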
MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network
Kishimoto, Akihiro, Kajino, Hiroshi, Hirose, Masataka, Fuchiwaki, Junta, Priyadarsini, Indra, Hamada, Lisa, Shinohara, Hajime, Nakano, Daiju, Takeda, Seiji
Property prediction plays an important role in material discovery. As an initial step toward eventually developing a foundation model for materials science, we introduce a new autoencoder called MHG-GNN, which combines a graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.
Human-AI Co-Creation Approach to Find Forever Chemicals Replacements
Ferreira, Juliana Jansen, Segura, Vinícius, Souza, Joana G. R., Barbosa, Gabriel D. J., Gallas, João, Cerqueira, Renato, Zubarev, Dmitry
Generative models are a powerful tool in AI for material discovery. We are designing a software framework that supports a human-AI co-creation process to accelerate finding replacements for ``forever chemicals'' -- chemicals that enable our modern lives but are harmful to the environment and human health. Our approach combines AI capabilities with the domain-specific tacit knowledge of subject matter experts to accelerate material discovery. Our co-creation process starts with the interaction between the subject matter experts and a generative model that can generate new molecule designs. In this position paper, we discuss our hypothesis that these subject matter experts can benefit from a more iterative interaction with the generative model, asking for smaller samples and ``guiding'' the exploration of the discovery space with their knowledge.